RCrawler: An R package for parallel web crawling and scraping
Similar resources
An Extended Model for Effective Migrating Parallel Web Crawling with Domain Specific and Incremental Crawling
The Internet is large and has grown enormously. Search engines are the tools for website navigation and search. Search engines maintain indices of web documents and provide search facilities by continuously downloading web pages for processing. This process of downloading web pages is known as web crawling. In this paper we propose the architecture for Effective Migrating Parall...
Data-Parallel Web Crawling Models
The need to quickly locate, gather, and store the vast amount of material in the Web necessitates parallel computing. In this paper, we propose two models, based on multi-constraint graph-partitioning, for efficient data-parallel Web crawling. The models aim to balance the amount of data downloaded and stored by each processor as well as balancing the number of page requests made by the process...
An extended model for effective migrating parallel web crawling with domain specific crawling
The Internet is large and has grown enormously. Search engines are the tools for website navigation and search. Search engines maintain indices of web documents and provide search facilities by continuously downloading web pages for processing. This process of downloading web pages is known as web crawling. In this paper we propose the architecture for Effective Migrating Parall...
haploR: an R package for querying web-based annotation tools
We developed haploR, an R package for querying the web-based genome annotation tools HaploReg and RegulomeDB. haploR gathers information in a data frame that is suitable for downstream bioinformatic analyses. This will facilitate post-genome-wide association studies and streamline analysis for the rapid discovery and interpretation of genetic associations.
haploR: an R-package for querying web-based annotation tools
There is a set of web-based tools for integrating and exploring information linked to annotated genetic variants. We developed haploR, an R package for querying such web-based genome annotation tools (currently implemented for HaploReg and RegulomeDB) and gathering information in a format suitable for downstream bioinformatic analyses. This will facilitate post-genome wide association stu...
Journal
Journal title: SoftwareX
Year: 2017
ISSN: 2352-7110
DOI: 10.1016/j.softx.2017.04.004